Matin Takhtesangi. washback
'''Washback'''

Simply put, washback is the direct impact of testing on individuals. Hughes (1989) defines washback as “the effect of testing on teaching and learning,” and believes that testing can have either a beneficial or a harmful effect on teaching and learning. This impact operates at two levels, a micro level and a macro level.

'''Micro level:''' At the micro level, the test affects individuals. There are two kinds of individuals: test takers and teachers.

'''I. Impact on test takers:''' Test takers can be affected by three aspects of the testing procedure:
1. The experience of taking and preparing for the test.
2. The feedback they receive about their performance on the test.
3. The decisions that may be made about them on the basis of their test scores.

'''II. Impact on teachers:''' Most teachers are familiar with the amount of influence testing can have on their instruction. They can administer a test to measure their students’ ability and identify their needs, and then adjust their method of instruction to those needs.

'''Macro level:''' At the macro level, the test affects society and the educational system. For example, by testing students and understanding their needs, the educational system can decide whether or not to use multiple-choice questions.

'''WASHBACK: THE DEFINITION AND ORIGIN'''

Although washback is a term commonly used in applied linguistics today, it is rarely found in dictionaries. However, the word ''backwash'' can be found in certain dictionaries and is defined as “the unwelcome repercussions of some social action” by the ''New Webster’s Comprehensive Dictionary'', and “unpleasant after-effects of an event or situation” by the ''Collins Cobuild Dictionary''.
The negative connotations of these two definitions are interesting, as they inadvertently touch on some of the negative responses and reactions to the relationships between teaching and testing, which we explore in more detail shortly. ''Washback'' (Alderson & Wall, 1993) or ''backwash'' (Biggs, 1995, 1996) here refers to the influence of testing on teaching and learning. The concept is rooted in the notion that tests or examinations can and should drive teaching, and hence learning, and is also referred to as ''measurement-driven instruction'' (Popham, 1987). In order to achieve this goal, a “match” or an overlap between the content and format of the test or the examination and the content and format of the curriculum (or “curriculum surrogate” such as the textbook) is encouraged. This is referred to as ''curriculum alignment'' by Shepard (1990, 1991b, 1992, 1993). Although the idea of alignment (matching the test and the curriculum) has been described by some as “unethical,” and as threatening the validity of the test (Haladyna, Nolen, & Haas, 1991), such alignment is evident in a number of countries, for example, Hong Kong (see Cheng, 1998a; Stecher, Barron, Chun, Krop, & Ross, 2000). This alignment, in which a new or revised examination is introduced into the education system with the aim of improving teaching and learning, is referred to as ''systemic validity'' by Frederiksen and Collins (1989), ''consequential validity'' by Messick (1989, 1992, 1994, 1996), and ''test impact'' by Bachman and Palmer (1996) and Baker (1991).

Wall (1997) distinguished between test impact and test washback in terms of the scope of the effects. According to Wall, ''impact'' refers to “. . . any of the effects that a test may have on individuals, policies or practices, within the classroom, the school, the educational system or society as a whole” (see Stecher, Chun, & Barron, chap.
4, this volume), whereas ''washback'' (or ''backwash'') is defined as “the effects of tests on teaching and learning” (Wall, 1997). Although different terms are preferred by different researchers, they all refer to different facets of the same phenomenon: the influence of testing on teaching and learning. The authors of this chapter have chosen to use the term ''washback'', as it is the most commonly used in the field of applied linguistics.

The study of washback has resulted in recent developments in language testing, and measurement-driven reform of instruction in general education. Research in language testing has centered on whether and how we assess the specific characteristics of a given group of test takers and whether and how we can incorporate such information into the ways in which we design language tests. One of the most important theoretical developments in language testing in the past 30 years has been the realization that a language test score represents a complex of multiple influences. Language test scores cannot be interpreted simplistically as an indicator of the particular language ability we think we are measuring. The scores are also affected by the characteristics and contents of the test tasks, the characteristics of the test takers, the strategies test takers employ in attempting to complete the test tasks, as well as the inferences we draw from the test results. These factors undoubtedly interact with each other.

Nearly 20 years ago, Alderson (1986) identified ''washback'' as a distinct (and at that time emerging) area within language testing, to which we needed to turn our attention. Alderson (1986) discussed the “potentially powerful influence of tests” (p. 104) and argued for innovations in the language curriculum through innovations in language testing (also see Wall, 1996, 1997, 2000).
At around the same time, Davies (1985) was asking whether tests should necessarily follow the curriculum, and suggested that perhaps tests ought to lead and influence the curriculum. Morrow (1986) extended the use of washback to include the notion of ''washback validity'', which describes the relationship between testing, and teaching and learning. Morrow also claimed that “. . . in essence, an examination of washback validity would take testing researchers into the classroom in order to observe the effects of their tests in action”. This has important implications for test validity.

Looking back, we can see that examinations have often been used as a means of control, and have been with us for a long time: a thousand years or more, if we include their use in Imperial China to select the highest officials of the land. Those examinations were probably the first civil service examinations ever developed. To avoid corruption, all essays in the Imperial Examination were marked anonymously, and the Emperor personally supervised the final stage of the examination. Although the goal of the examination was to select civil servants, its washback effect was to establish and control an educational program, as prospective mandarins set out to prepare themselves for the examination that would decide not only their personal fate but also influence the future of the Empire (Spolsky, 1995a, 1995b).

The use of examinations to select for education and employment has also existed for a long time. Examinations were seen by some societies as ways to encourage the development of talent, to upgrade the performance of schools and colleges, and to counter, to some degree, nepotism, favoritism, and even outright corruption in the allocation of scarce opportunities (Bray & Steward, 1998; Eckstein & Noah, 1992). If the initial spread of examinations can be traced back to such motives, the very same reasons appear to be as powerful today as ever they were.
Linn (2000) classified the use of tests and assessments as key elements in relation to five waves of educational reform over the past 50 years: their tracking and selecting role in the 1950s; their program accountability role in the 1960s; minimum competency testing in the 1970s; school and district accountability in the 1980s; and the standards-based accountability systems in the 1990s (p. 4). Furthermore, it is clear that tests and assessments are continuing to play a crucial and critical role in education into the new millennium.

In spite of this long and well-established place in educational history, the use of tests has constantly been subject to criticism. Nevertheless, tests continue to occupy a leading place in the educational policies and practices of a great many countries. Researchers have, over many years, documented the impact of testing on school and classroom practices, and on the personal and professional lives and experiences of principals, teachers, students, and other educational stakeholders. Aware of the power of tests, policymakers in many parts of the world continue to use them to manipulate their local educational systems, to control curricula and to impose (or promote) new textbooks and new teaching methods. Testing and assessment are “the darling of the policy-makers” (Madaus, 1985a, 1985b) despite the fact that they have been the focus of controversy for as long as they have existed. One reason for their longevity in the face of such criticism is that tests are viewed as the primary tools through which changes in the educational system can be introduced without having to change other educational components such as teacher training or curricula. Shohamy (1992) originally noted that “this phenomenon [washback] is the result of the strong authority of external testing and the major impact it has on the lives of test takers.”
One example of these beliefs about the legislative power and authority of tests was seen in 1994 in Canada, where a consortium of provincial ministers of education instituted a system of national achievement testing in the areas of reading, language arts, and science (Council of Ministers of Education, Canada, 1994). Most of the provinces now require students to pass centrally set school-leaving examinations as a condition of school graduation (Anderson, Muir, Bateson, Blackmore, & Rogers, 1990; Lock, 2001; Runte, 1998; Widen, O’Shea, & Pye, 1997). Petrie (1987) concluded that “it would not be too much of an exaggeration to say that evaluation and testing have become the engine for implementing educational policy” (p. 175). The extent to which this is true depends on the different contexts, as shown by those explored in this volume, but a number of recurring themes do emerge.

Examinations of various kinds have been used for a very long time for many different purposes in many different places. There is a set of relationships, planned and unplanned, positive and negative, between teaching and testing. These two facts mean that, although washback has only been identified relatively recently, it is likely that washback effects have been occurring for an equally long time. It is also likely that these teaching–testing relationships will become closer and more complex in the future. It is therefore essential that the education community work together to understand and evaluate the effects of the use of testing on all of the interconnected aspects of teaching and learning within different education systems.

'''WASHBACK: POSITIVE, NEGATIVE, NEITHER OR BOTH?'''

Movement in a particular direction is an inherent part of the use of the washback metaphor to describe teaching–testing relationships.
For example, Pearson (1988) stated that “public examinations influence the attitudes, behaviors, and motivation of teachers, learners and parents, and, because examinations often come at the end of a course, this influence is seen working in a backward direction—hence the term ‘washback’ ” (p. 98). However, like Davies (1985), Pearson believed that the direction in which washback actually works must be forward (i.e., testing leading teaching and learning).

The potentially bidirectional nature of washback has been recognized by, for example, Messick (1996), who defined washback as the “extent to which a test influences language teachers and learners to do things they would not necessarily otherwise do that promote or inhibit language learning.” Wall and Alderson also noted that “tests can be powerful determiners, ''both positively and negatively'', of what happens in classrooms” (emphasis added; Alderson & Wall, 1993; Wall & Alderson, 1993).

Messick (1996) went on to comment that some proponents have even maintained that a test’s validity should be appraised by the degree to which it manifests positive or negative washback, which is similar to Frederiksen and Collins’ (1989) notion of systemic validity. Underpinning the notion of direction is the issue of what it is that is being directed. Biggs (1995) used the term ''backwash'' (p. 12) to refer to the fact that testing drives not only the curriculum, but also the teaching methods and students’ approaches to learning (Crooks, 1988; Frederiksen, 1984; Frederiksen & Collins, 1989). However, Spolsky (1994) believed that “backwash is better applied only to accidental side-effects of examinations, and not to those effects intended when the first purpose of the examination is control of the curriculum” (p. 55).
In an empirical study of an intended public examination change on classroom teaching in Hong Kong, Cheng (1997, 1998a) combined movement and motive, defining washback as “an intended direction and function of curriculum change, by means of a change of public examinations, on aspects of teaching and learning” (Cheng, 1997, p. 36). As Cheng’s study showed, when a public examination is used as a vehicle for an intended curriculum change, unintended and accidental side effects can also occur, that is, both negative and positive influence, as such change involves elaborate and extensive webs of interwoven causes and effects. Whether the effect of testing is deemed to be positive or negative should also depend on ''who'' it is that actually conducts the investigation within a particular education context, as well as ''where'' (the school or university contexts), ''when'' (the time and duration of using such assessment practices), ''why'' (the rationale), and ''how'' (the different approaches used by different participants within the context).

If the potentially bidirectional nature of washback is accepted, and movement in a positive direction is accepted as the aim, the question then becomes methodological, that is, how to bring about this positive movement. After considering several definitions of washback, Bailey (1996) concluded that more empirical research needed to be carried out in order to document its exact nature and mechanisms, while also identifying “concerns about what constitutes both positive and negative washback, as well as about how to promote the former and inhibit the latter” (p. 259). According to Messick (1996), “for optimal positive washback there should be little, if any, difference between activities involved in learning the language and activities involved in preparing for the test.”
However, the lack of simple, one-to-one relationships in such complex systems was highlighted by Messick (1996): “A poor test may be associated with positive effects and a good test with negative effects because of other things that are done or not done in the education system” (p. 242). In terms of complexity and validity, Alderson and Wall (1993) argued that washback is “likely to be a complex phenomenon which cannot be related directly to a test’s validity” (p. 116). The washback effect should, therefore, refer to the effects of the test itself on aspects of teaching and learning. The fact that there are so many other forces operating within any education context, which also contribute to or ensure the washback effect on teaching and learning, has been demonstrated in several washback studies (e.g., Anderson et al., 1990; Cheng, 1998b, 1999; Herman, 1992; Madaus, 1988; Smith, 1991a, 1991b; Wall, 2000; Watanabe, 1996a; Widen et al., 1997). The key issue here is how those forces within a particular educational context can be teased out to understand the effects of testing in that environment, and how confident we can be in formulating hypotheses and drawing conclusions about the nature and the scope of the effects within broader educational contexts.

'''Negative Washback'''

Tests in general, and perhaps language tests in particular, are often criticized for their negative influence on teaching, so-called “negative washback,” which has long been identified as a potential problem. For example, nearly 50 years ago, Vernon (1956) claimed that teachers tended to ignore subjects and activities that did not contribute directly to passing the exam, and that examinations “distort the curriculum” (p. 166).
Wiseman (1961) believed that paid coaching classes, which were intended for preparing students for exams, were not a good use of the time, because students were practicing exam techniques rather than language learning activities (p. 159), and Davies (1968) believed that testing devices had become teaching devices; that teaching and learning was effectively being directed to past examination papers, making the educational experience narrow and uninteresting.

More recently, Alderson and Wall (1993) referred to negative washback as the undesirable effect on teaching and learning of a particular test deemed to be “poor.” Alderson and Wall’s ''poor'' here means “something that the teacher or learner does not wish to teach or learn.” The tests may well fail to reflect the learning principles or the course objectives to which they are supposedly related. In reality, teachers and learners may end up teaching and learning toward the test, regardless of whether or not they support the test or fully understand its rationale or aims.

In general education, Fish (1988) found that teachers reacted negatively to pressure created by public displays of classroom scores, and also found that relatively inexperienced teachers felt greater anxiety and accountability pressure than experienced teachers, showing the influence of factors such as age and experience. Noble and Smith (1994a) also found that high-stakes testing could affect teachers directly and negatively (p. 3), and that “teaching test-taking skills and drilling on multiple-choice worksheets is likely to boost the scores but unlikely to promote general understanding” (1994b).
From an extensive qualitative study of the role of external testing in elementary schools in the United States, Smith (1991b) listed a number of damaging effects, as the “testing programs substantially reduce the time available for instruction, narrow curricular offerings and modes of instruction, and potentially reduce the capacities of teachers to teach content and to use methods and materials that are incompatible with standardized testing formats.” This narrowing was not the only detrimental effect found in a Canadian study, in which Anderson et al. (1990) carried out a survey study investigating the impact of re-introducing final examinations at Grade 12 in British Columbia. The teachers in the study reported a narrowing to the topics the examination was most likely to include, and that students adopted more of a memorization approach, with reduced emphasis on critical thinking. In a more recent Canadian study (Widen et al., 1997), Grade 12 science teachers reported their belief that they had lost much of their discretion in curriculum decision making, and, therefore, much of their autonomy. When teachers believe they are being circumscribed and controlled by the examinations, and students’ focus is on what will be tested, teaching and learning are in danger of becoming limited and confined to those aspects of the subject and field of study that are testable (see also Calder, 1990, 1997).

'''Positive Washback'''

As in most areas of language testing, for each argument in favor of or opposed to a particular position, there is a counterargument. There are, then, researchers who strongly believe that it is feasible and desirable to bring about beneficial changes in teaching by changing examinations, representing the “positive washback” scenario, which is closely related to “measurement-driven instruction” in general education.
In this case, teachers and learners have a positive attitude toward the examination or test, and work willingly and collaboratively toward its objectives. For example, Heyneman (1987) claimed that many proponents of academic achievement testing view “coachability” not as a drawback, but rather as a virtue (p. 262), and Pearson (1988) argued for a mutually beneficial arrangement, in which “good tests will be more or less directly usable as teaching-learning activities. Similarly, good teaching-learning tasks will be more or less directly usable for testing purposes, even though practical or financial constraints limit the possibilities” (p. 107). Considering the complexity of teaching and learning and the many constraints other than those financial, such claims may sound somewhat idealistic, and even open to accusations of being rather simplistic. However, Davies (1985) maintained that “creative and innovative testing . . . can, quite successfully, attract to itself a syllabus change or a new syllabus which effectively makes it into an achievement test” (p. 8). In this case, the test no longer needs to be just an obedient servant. It can also be a leader. As the foregoing studies show, there are conflicting reactions toward positive and negative washback on teaching and learning, and no obvious consensus in the research community as to whether certain washback effects are positive or negative. As was discussed earlier, one reason for this is the potentially bidirectional nature of an exam or test, the positive or negative nature of which can be influenced by many contextual factors. According to Pearson (1988), a test’s washback effect will be negative if it fails to reflect the learning principles and course objectives to which the test supposedly relates, and it will be positive if the effects are beneficial and “encourage the whole range of desired changes” (p. 101). 
Alderson and Wall (1993), on the other hand, stressed that the quality of the washback effect might be independent of the quality of a test (pp. 117–118). Any test, good or bad, may result in beneficial or detrimental washback effects. It is possible that research into washback may benefit from turning its attention toward looking at the complex causes of such a phenomenon in teaching and learning, rather than focusing on deciding whether or not the effects can be classified as positive or negative. According to Alderson and Wall (1993), one way of doing this is to first investigate as thoroughly as possible the broad educational context in which an assessment is introduced, since other forces exist within the society and the education system that might prevent washback from appearing (p. 116). A potentially key societal factor is the political forces at work. As Heyneman (1987) put it: “Testing is a profession, but it is highly susceptible to political interference. To a large extent, the quality of tests relies on the ability of a test agency to pursue professional ends autonomously.”

If the consequences of a particular test for teaching and learning are to be evaluated, the educational context in which the test takes place needs to be fully understood. Whether the washback effect is positive or negative will largely depend on where and how it exists and manifests itself within a particular educational context, such as those studies explored in this volume.

'''WASHBACK: FUNCTIONS AND MECHANISMS'''

Traditionally, tests have come at the end of the teaching and learning process for evaluative purposes. However, with the widespread expansion and proliferation of high-stakes public examination systems, the direction seems to have been largely reversed. Testing can come first in the teaching and learning process.
Particularly when tests are used as levers for change, new materials need to be designed to match the purposes of a new test, and school administrative and management staff, teachers, and students are generally required to learn to work in alternative ways, and often work harder, to achieve high scores on the test. In addition to these changes, many more changes in the teaching and learning context can occur as the result of a new test, although the consequences and effects may be independent of the original intentions of the test designers, due to the complex interplay of forces and factors both within and beyond the school.

Such influences were linked to test validity by Shohamy (1993a), who pointed out that “the need to include aspects of test use in construct validation originates in the fact that testing is not an isolated event; rather, it is connected to a whole set of variables that interact in the educational process” (p. 2). Similarly, Linn (1992) encouraged the measurement research community “to make the case that the introduction of any new high-stakes examination system should pay greater attention to investigations of both the intended and unintended consequences of the system than was typical of previous test-based reform efforts” (p. 29). As a result of this complexity, Messick (1989) recommended a unified validity concept, which requires that when an assessment model is designed to make inferences about a certain construct, the inferences drawn from that model should not only derive from test score interpretation but also from other variables operating within the social context. The importance of collaboration was also highlighted by Messick (1975): “Researchers, other educators, and policy makers must work together to develop means of evaluating educational effectiveness that accurately represent a school or district’s progress toward a broad range of important educational goals.”
Wall (1996) stressed the difficulties in finding explanations of how tests exert influence on teaching (p. 334). Wall (1999, 2000) used the innovation literature and incorporated findings from this literature into her research areas to propose ways of exploring the complex aspect of washback:
* The writing of detailed baseline studies to identify important characteristics in the target system and the environment, including an analysis of current testing practices (Shohamy et al., 1996), current teaching practices, resources (Bailey, 1996; Stevenson & Riewe, 1981), and attitudes of key stakeholders (Bailey, 1996; Hughes, 1993).
* The formation of management teams representing all the important interest groups, for example, teachers, teacher trainers, university specialists, ministry officials, parents, and learners (Cheng, 1998a).

Fullan with Stiegelbauer (1991) and Fullan (1993), also in the context of innovation and change, discussed changes in schools, and identified two main recurring themes:
* Innovation should be seen as a process rather than as an event.
* All the participants who are affected by an innovation have to find their own “meaning” for the change.

Fullan explained that the “subjective reality” which teachers experience would always contrast with the “objective reality” that the proponents of change had originally imagined. According to Fullan, teachers work on their own, with little reference to experts or consultation with colleagues. They are forced to make on-the-spot decisions, with little time to reflect on better solutions. They are pressured to accomplish a great deal, but are given far too little time to achieve their goals. When, on top of this, they are expected to carry forward an innovation that is generally not of their own making, their lives can become very difficult indeed. This may help to explain why intended washback does or does not occur in teaching and learning.
If educational change is imposed upon those parties most directly affected by the change, that is, learners and teachers, without consultation of those parties, resistance is likely to be the natural response (Curtis, 2000). In addition, it has also been found that there tend to be discrepancies between the intention of any innovation or curriculum change and the understanding of teachers who are tasked with the job of implementing that change (Andrews, 1994, 1995; Markee, 1997).

Andrews (1994, 1995) highlighted the complexity of the relationship between washback and curriculum innovation, and summarized three possible responses of educators to washback: fight it, ignore it, or use it (see also Andrews’s chap. 3 in this volume; Heyneman, 1987, p. 260). By “fight it,” Heyneman referred to the effort to replace examinations with other sorts of selection processes and criteria, on the grounds that examinations have encouraged rote memorization at the expense of more desirable educational practices. In terms of “ignoring it,” Andrews (1994) used the metaphor of the ostrich pretending that on-coming danger does not really exist by hiding its head in the sand (pp. 51–52). According to Andrews, those who are involved with mainstream activities, such as syllabus design, material writing, and teacher training, view testers as a “special breed” using an obscure and arcane terminology. Tests and exams have been seen as an occasional necessary evil, a dose of unpleasant medicine, the taste of which should be washed away as quickly as possible. The third response, “use it,” is now perhaps the most common of the three, and using washback to promote particular pedagogical goals is now a well-established approach in education. The question of who it is that uses it relates, at least in part, to the earlier discussion of the legislative power of tests as perceived by governments and policymakers in many parts of the world.